Canadian Intellectual Property Office
An Agency of Industry Canada

Office de la Propriété Intellectuelle du Canada

(11) CA 2 387 404 A1
(40) 13.04.2001
(43) 13.04.2001

(12)
(21) 2 387 404
(22) 29.00.2000
(51) Int. Cl.7: G06T 9/00
(85) 10.04.2002
(86) PCT/KR00/01084
(30) 1999/43712 KR 11.10.1999

(71) ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE,
     161 Kajong-Dong, Yusong-Gu, DAEJEON-SHI, XX (KR).

(72) KIM, JAE GON (KR).
     CHANG, HYUN SUNG (KR).
     KIM, m-ysioom (KR).
     KIM, MUNCHURL (KR).

(74) SMART & BIGGAR

SCHEMA DE DESCRIPTION DE RESUME VIDEO ET PROCEDE ET SYSTEME DE GENERATION DE
DONNEES DE DESCRIPTION DE RESUME VIDEO POUR VUE D'ENSEMBLE ET EXPLORATION
EFFICACES

VIDEO SUMMARY DESCRIPTION SCHEME AND METHOD AND SYSTEM OF VIDEO SUMMARY
DESCRIPTION DATA GENERATION FOR EFFICIENT OVERVIEW AND BROWSING

(57)

The present invention relates to a video summary description scheme for describing a
video summary by metadata. The video summary provides overview functionality, which
makes it feasible to understand the overall contents of the original video within a short
time, and navigation and browsing functionalities, which make it feasible to search the
desired video contents efficiently. According to the present invention, the
HierarchicalSummary Description Scheme (DS) comprises at least one HighlightLevel DS and
selectively comprises the SummaryThemeList DS. The HighlightLevel DS describes a highlight
level and may have zero or at least one lower HighlightLevel DS. The HighlightLevel DS
comprises one or more HighlightSegment DSs, each of which describes highlight segment
information constituting the video summary of the highlight level. The HighlightSegment DS
comprises the VideoSegmentLocator DS for describing the time information of the
corresponding segment interval. Also, the HighlightSegment DS may comprise the
ImageLocator DS for describing the representative image information of the corresponding
segment, the SoundLocator DS for describing the representative sound information, and the
AudioSegmentLocator DS for describing the audio segment information constituting the
audio summary.

VIDEO SUMMARY DESCRIPTION SCHEME AND METHOD AND SYSTEM OF
VIDEO SUMMARY DESCRIPTION DATA GENERATION FOR EFFICIENT
OVERVIEW AND BROWSING

TECHNICAL FIELD 

The present invention relates to a video summary description scheme for efficient video
overview and browsing, and also relates to a method and system of video summary
description generation for describing a video summary according to the video summary
description scheme.

The technical fields to which the present invention pertains are content-based video
indexing and browsing/searching, and summarizing video on the basis of its content and
then describing the summary.

BACKGROUND OF THE INVENTION 

The format of summarizing video largely falls into dynamic summary and static summary.
The video description scheme according to the present invention is for efficiently
describing the dynamic summary and the static summary in a unified description scheme.

Generally, because the existing video summary and description scheme simply provide the
information of the video intervals which are included in the video summary, the existing
video summary and description scheme are limited to conveying the overall video contents
through the playing of the summary video.

However, in many cases, browsing for identifying and revisiting the parts of interest
through an overview of the overall contents is needed, rather than only an overview of the
overall contents through the summary video.

Also, the existing video summary provides only the video intervals which are considered
to be important according to the criteria determined by the video summary provider.
Accordingly, if the criteria of the users and the video provider are different from each
other, or the users have special criteria, the users cannot obtain the video summary they
desire.

That is, although the existing summary video permits the users to select the summary video
of a desired level by providing several summary videos, it limits the extent of the users'
selection in that the users cannot select by the contents of the summary videos.

The US patent 5,821,945 entitled "Method and apparatus for video browsing based on content
and structure" represents video in a compact form and provides browsing functionality for
accessing the video with the desired content through the representation.

However, the patent concerns the static summary based on the representative frame.
Although the existing static summary summarizes by using the representative frame of the
video shot, the representative frame of this patent provides only visual information
representing the shot, so the patent is limited in conveying information through the
summary.

As compared with the patent, the video description scheme and browsing method of the
present invention utilize the dynamic summary based on the video segment.

A video summary description scheme was proposed in the MPEG-7 Description Scheme (V0.5)
announced in ISO/IEC JTC1/SC29/WG11 Output Document No. N2844 in July 1999. Because the
scheme describes the interval information of each video segment of the dynamic summary
video, it provides the basic functionalities for describing a dynamic summary, but it has
problems in the following aspects.

First, there is the drawback that it cannot provide access to the original video from the
summary segments constituting the summary video. That is, users want to access the
original video to understand more detailed information on the basis of the summary
contents and the overview obtained through the summary video; however, the existing scheme
cannot meet this need.

Secondly, the existing scheme cannot provide sufficient audio summary description
functionalities.

And finally, there is the drawback that, in the case of representing an event based
summary, duplicate description and complexity of searching are unavoidable.



SUMMARY OF THE INVENTION

An object of the present invention is to provide a hierarchical video summary description
scheme, which comprises the representative frame information and the representative sound
information of each video interval included in the summary video, and which makes feasible
a user customized event based summary offering the users a selection of the contents of
the summary video, as well as efficient browsing, and to provide a video summary
description data generation method and system using the description scheme.

In order to achieve the object, the HierarchicalSummary DS according to an embodiment of
the present invention comprises at least one HighlightLevel DS which describes a highlight
level, and the HighlightLevel DS comprises at least one HighlightSegment DS which
describes highlight segment information constituting the summary video of the highlight
level.

Preferably, the HighlightLevel DS is composed of at least one lower level HighlightLevel
DS.

More preferably, the HighlightSegment DS comprises a VideoSegmentLocator DS which
describes the time information or the video itself of said corresponding highlight
segment.

It is preferable that the HighlightSegment DS further comprises an ImageLocator DS which
describes the representative frame of said corresponding highlight segment.

It is more preferable that the HighlightSegment DS further comprises a SoundLocator DS
which describes the representative sound information of said corresponding highlight
segment.

Preferably, the HighlightSegment DS further comprises an ImageLocator DS which describes
the representative frame of said corresponding highlight segment and a SoundLocator DS
which describes the representative sound information of said corresponding highlight
segment.

More preferably, the ImageLocator DS describes the time information or the image data of
the representative frame of the video interval corresponding to said corresponding
highlight segment.

Preferably, the HighlightSegment DS further comprises an AudioSegmentLocator DS which
describes the audio segment information constituting an audio summary of said
corresponding highlight segment.

More preferably, the AudioSegmentLocator DS describes the time information or the audio
data of the audio interval of said corresponding highlight segment.

It is preferable that the HierarchicalSummary DS includes a SummaryComponentList
describing and enumerating all of the SummaryComponentTypes which are included in the
HierarchicalSummary DS.

Also, it is preferable that the HierarchicalSummary DS includes a SummaryThemeList DS
which enumerates the events or subjects comprised in the summary and describes their IDs,
and thereby describes an event based summary and permits the users to browse the summary
video by the event or subject described in said SummaryThemeList.

It is more preferable that the SummaryThemeList DS includes an arbitrary number of
SummaryThemes as elements, that said SummaryTheme includes an attribute of id representing
the corresponding event or subject, and that the SummaryTheme further includes an
attribute of parentId which describes the id of the event or subject of the upper level.

Preferably, the HighlightLevel DS includes an attribute of themeIds describing said
attribute of ids of the common events or subjects if all of the HighlightSegments and
HighlightLevels which constitute the corresponding highlight level have common events or
subjects.

More preferably, the HighlightSegment DS includes an attribute of themeIds describing said
attribute of id, and describes the event or subject of the corresponding highlight
segment.

Also, according to the present invention, a computer-readable recording medium in which a
HierarchicalSummary DS is stored is provided. Preferably, the HierarchicalSummary DS
comprises at least one HighlightLevel DS which describes a highlight level, the
HighlightLevel DS comprises at least one HighlightSegment DS which describes highlight
segment information constituting the summary video of that highlight level, and the
HighlightSegment DS comprises a VideoSegmentLocator DS describing the time information or
the video itself of said corresponding highlight segment.
Also, according to the present invention, a method for generating video summary
description data according to a video summary description scheme by inputting original
video is provided. The method includes the following steps: a video analyzing step which
produces a video analysis result by inputting the original video and then analyzing the
original video; a summary rule defining step which defines the summary rule for selecting
the summary video interval; a summary video interval selecting step which constitutes
summary video interval information by selecting the video interval capable of summarizing
the video contents from the original video by inputting said original video analysis
result and said summary rule; and a video summary describing step which produces video
summary description data according to the HierarchicalSummary DS by inputting the summary
video interval information output by said summary video interval selecting step.

Preferably, the video analyzing step comprises: a feature extracting step which outputs
the types of features and the video time intervals at which those features are detected by
inputting the original video and extracting those features; an event detecting step which
detects key events included in the original video by inputting said types of features and
the video time intervals at which those features are detected; and an episode detecting
step which detects episodes by dividing the original video on a story flow basis on the
basis of said detected events.

Preferably, the summary rule defining step provides the types of summary events, which are
the bases for selecting the summary video interval, to said video summary describing step
after defining them.

More preferably, the method further comprises a representative frame extracting step which
provides the representative frame to said video summary describing step by inputting said
summary video interval information and extracting the representative frame.

More preferably, the method further comprises a representative sound extracting step which
provides the representative sound to said video summary describing step by inputting said
summary video interval information and extracting the representative sound.

Also, according to the present invention, a computer-readable recording medium in which a
program is stored is provided. The program executes the following steps: a feature
extracting step which outputs the types of features and the video time intervals at which
those features are detected; an event detecting step which detects key events included in
the original video by inputting said types of features and said video time intervals at
which those features are detected; an episode detecting step which detects episodes by
dividing the original video on a story flow basis on the basis of said detected key
events; a summary rule defining step which defines the summary rule for selecting the
summary video interval; a summary video interval selecting step which constitutes summary
video interval information by selecting the video interval capable of summarizing the
video contents of the original video by inputting said detected episodes and said summary
rule; and a video summary describing step which generates video summary description data
with the HierarchicalSummary DS by inputting the summary video interval information output
by said summary video interval selecting step.

Also, according to the present invention, a system for generating video summary
description data according to a video summary description scheme by inputting original
video is provided. The system includes video analyzing means for outputting a video
analysis result by inputting the original video and analyzing the original video, summary
rule defining means for defining the summary rule for selecting the summary video
interval, summary video interval selecting means for constituting summary video interval
information by selecting the video interval capable of summarizing the video contents of
the original video by inputting said video analysis result and said summary rule, and
video summary describing means for generating video summary description data with the
HierarchicalSummary DS by inputting the summary video interval information output by said
summary video interval selecting means.

Preferably, the HierarchicalSummary DS comprises at least one HighlightLevel DS which
describes a highlight level, the HighlightLevel DS comprises at least one HighlightSegment
DS which describes highlight segment information constituting the summary video of the
highlight level, and the HighlightSegment DS comprises a VideoSegmentLocator DS describing
the time information or the video itself of said corresponding highlight segment.

Preferably, the video analyzing means comprises feature extracting means for outputting
the types of features and the video time intervals at which those features are detected by
inputting the original video and extracting those features, event detecting means for
detecting key events included in the original video by inputting said types of features
and the video time intervals at which those features are detected, and episode detecting
means for detecting episodes by dividing the original video on a story flow basis on the
basis of said detected events.

More preferably, the summary rule defining means provides the types of summary events,
which are the bases for selecting the summary video interval, to said video summary
describing means after defining them.

It is preferable that the system further comprises representative frame extracting means
for providing the representative frame to said video summary describing means by inputting
said summary video interval information and extracting the representative frame.

It is more preferable that the system further comprises representative sound extracting
means for providing the representative sound to said video summary describing means by
inputting said summary video interval information and extracting the representative sound.

Also, according to the present invention, a computer-readable recording medium in which a
program is stored is provided. The program causes a computer to function as: feature
extracting means for outputting the types of features and the video time intervals at
which those features are detected; event detecting means for detecting key events included
in the original video by inputting said types of features and said video time intervals at
which those features are detected; episode detecting means for detecting episodes by
dividing the original video on a story flow basis on the basis of said detected key
events; summary rule defining means for defining the summary rule for selecting the
summary video interval; summary video interval selecting means for constituting summary
video interval information by selecting the video interval capable of summarizing the
video contents of the original video by inputting said detected episodes and said summary
rule; and video summary describing means for generating video summary description data
with the HierarchicalSummary DS by inputting the summary video interval information output
by said summary video interval selecting means.

Also, a video browsing system in a server/client environment according to the present
invention is provided. The system includes a server equipped with a video summary
description data generation system, which generates video summary description data on the
basis of the HierarchicalSummary DS by inputting original video and links said original
video and the video summary description data, and a client which browses and navigates the
video by an overview of said original video and by access to the original video of said
server using said video summary description data.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments of the present invention will be explained with reference to the
accompanying drawings, in which:

FIG. 1 is a block diagram illustrating a system for generating video summary description
data according to the description scheme of the present invention.

FIG. 2 is a drawing that illustrates the data structure of the HierarchicalSummary DS
describing the video summary description scheme according to the present invention in UML
(Unified Modeling Language).

FIG. 3 is a compositional drawing of the user interface of the tool for playing and
browsing the summary video by inputting the video summary description data described by
the same description scheme as FIG. 2.

FIG. 4 is a compositional drawing of the flow of data and control for hierarchical
browsing using the summary video of the present invention.

DETAILED DESCRIPTION OF THE INVENTION 
The present invention will be described in detail by way of a preferred embodiment with
reference to the accompanying drawings, in which like reference numerals are used to
identify the same or similar parts.

FIG. 1 is a block diagram illustrating a system for generating video summary description
data according to the description scheme of the present invention.

As illustrated in FIG. 1, the apparatus for generating video description data according to
the present invention is composed of a feature extracting part 101, an event detecting
part 102, an episode detecting part 103, a summary video interval selecting part 104, a
summary rule defining part 105, a representative frame extracting part 106, a
representative sound extracting part 107 and a video summary describing part 108.

The feature extracting part 101 extracts the features necessary to generate the summary
video by inputting the original video. The general features include shot boundary, camera
motion, caption region, face region and so on.

In the step of extracting features, the types of features and the video time intervals at
which those features are detected are output to the step of detecting events in the format
of (types of features, feature serial number, time interval).

For example, in the case of camera motion, (camera zoom, 1, 100 - 150) represents the
information that the first zoom of the camera was detected in frames 100 - 150.

The event detecting part 102 detects the key events which are included in the original
video. Because these events must represent the contents of the original video well and are
the references for generating the summary video, they are generally defined differently
according to the genre of the original video.

These events either may represent a higher meaning level or may be visual features from
which a higher meaning can be directly inferred. For example, in the case of soccer video,
goal, shoot, caption, replay and so on can be defined as events.

The event detecting part 102 outputs the types of detected events and the time interval in
the format of (types of events, event serial number, time interval). For example, the
event information indicating that the first goal occurred between frames 200 and 300 is
output in the format of (goal, 1, 200 - 300).
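
The (type, serial number, time interval) records output by the feature extracting part and the event detecting part can be pictured with a short sketch. This is only an illustration under stated assumptions: the `Detection` class and the `number_detections` helper are hypothetical names, serial numbers are assumed to be counted per type in order of start frame, and the second goal at frames 900 - 950 is made-up sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One (type, serial number, time interval) record for a feature or an event."""
    kind: str         # e.g. "camera zoom" (feature) or "goal" (event)
    serial: int       # 1-based count of this kind within the video
    start_frame: int
    end_frame: int

    def as_record(self):
        return (self.kind, self.serial, (self.start_frame, self.end_frame))


def number_detections(raw):
    """Turn (kind, start_frame, end_frame) triples into numbered records.

    Serial numbers are assigned per kind in order of start frame (an assumption).
    """
    counters = {}
    records = []
    for kind, start, end in sorted(raw, key=lambda r: r[1]):
        counters[kind] = counters.get(kind, 0) + 1
        records.append(Detection(kind, counters[kind], start, end))
    return records


records = number_detections([
    ("goal", 200, 300),
    ("camera zoom", 100, 150),
    ("goal", 900, 950),   # hypothetical second goal, for illustration only
])
for record in records:
    print(record.as_record())
```

With this sample input, the first two records correspond to the (camera zoom, 1, 100 - 150) and (goal, 1, 200 - 300) examples in the text.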

The episode detecting part 103, on the basis of the detected events, divides the video
into episodes, which are a larger unit than an event, based on the story flow. After the
key events are detected, an episode is detected so as to include the accompanying events
which follow the key event. For example, in the case of soccer video, goal and shoot can
be key events, and the bench scene, the audience scene, the goal ceremony scene, the
replay of the goal scene and so on compose the accompanying events of the key events.

That is, the episode is detected on the basis of the goal and shoot.

The episode detection information is output in the format of (episode number, time
interval, priority, feature shot, associated event information). Herein, the episode
number is the serial number of the episode, and the time interval represents the time
interval of the episode in shot units. The priority represents the degree of importance of
the episode. The feature shot represents the number of the shot including the most
important information out of the shots comprising the episode, and the associated event
information represents the event numbers of the events related to the episode. For
example, in the case of representing the episode detection information as (episode 1,
4 - 6, 1, 5, goal 1, caption 3), the information means that the first episode includes the
4th to 6th shots, the priority is the highest (1), the feature shot is the fifth shot, and
the associated events are the first goal and the third caption.
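
As a rough illustration of the episode detection information, the following sketch models one record and prints it in the format used above. The `Episode` class and its field names are hypothetical; the check that the feature shot lies inside the episode interval follows from the statement that the feature shot is one of the shots comprising the episode.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Episode:
    number: int                 # serial number of the episode
    first_shot: int             # time interval in shot units (inclusive)
    last_shot: int
    priority: int               # 1 is the highest degree of importance
    feature_shot: int           # shot carrying the most important information
    associated_events: List[Tuple[str, int]] = field(default_factory=list)

    def __post_init__(self):
        # The feature shot is chosen out of the shots comprising the episode.
        if not self.first_shot <= self.feature_shot <= self.last_shot:
            raise ValueError("feature shot must lie inside the episode interval")

    def as_record(self):
        events = ", ".join(f"{kind} {serial}" for kind, serial in self.associated_events)
        return (f"(episode {self.number}, {self.first_shot} - {self.last_shot}, "
                f"{self.priority}, {self.feature_shot}, {events})")


episode = Episode(1, 4, 6, 1, 5, [("goal", 1), ("caption", 3)])
print(episode.as_record())
```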

The summary video interval selecting part 104 selects the video intervals which summarize
the contents of the original video well, on the basis of the detected episodes. The
selection of the intervals is governed by the predefined summary rule of the summary rule
defining part 105.

The summary rule defining part 105 defines the rule for selecting the summary intervals
and outputs a control signal for selecting the summary intervals. The summary rule
defining part 105 also outputs the types of summary events, which are the bases for
selecting the summary video intervals, to the video summary describing part 108.
The summary video interval selecting part 104 outputs the time information of the selected
summary video intervals in frame units and outputs the types of events corresponding to
the video intervals. That is, the format of (100 - 200, goal), (500 - 700, shoot) and so
on represents that the video segments selected as the summary video intervals are frames
100 - 200, frames 500 - 700 and so on, and that the events of the segments are goal and
shoot respectively. In addition, information such as a file name can be output to
facilitate access to an additional video which is composed of only the summary video
intervals.
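
The (interval, event type) output of the summary video interval selecting part can be sketched as follows. The rule used here, keeping every interval whose event type is in a chosen set, is only a stand-in assumption for the summary rule, which the text does not specify; the function name and the caption interval are likewise hypothetical, and the sketch selects from event records rather than from full episodes for brevity.

```python
def select_summary_intervals(events, summary_event_types):
    """Keep the intervals whose event type is named by the summary rule.

    events: iterable of (event_type, serial_number, (start_frame, end_frame)).
    summary_event_types: set of event types chosen by the (hypothetical) rule.
    Returns a list of ((start_frame, end_frame), event_type) in time order.
    """
    selected = [
        (interval, kind)
        for kind, _serial, interval in events
        if kind in summary_event_types
    ]
    return sorted(selected)


detected = [
    ("shoot", 1, (500, 700)),
    ("caption", 1, (300, 350)),   # made-up interval, dropped by the rule below
    ("goal", 1, (100, 200)),
]
summary = select_summary_intervals(detected, {"goal", "shoot"})
print(summary)
```

For this input the result matches the (100 - 200, goal), (500 - 700, shoot) example in the text.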
When the summary video interval selection is completed, the representative frame and the
representative sound are extracted by the representative frame extracting part 106 and the
representative sound extracting part 107 respectively, using the summary video interval
information.

The representative frame extracting part 106 outputs the image frame number representing
the summary video interval or outputs the image data.

The representative sound extracting part 107 outputs the sound data representing the
summary video interval or outputs the sound time interval.

The video summary describing part 108 describes the related information so as to make
efficient summary and browsing functionalities feasible, according to the Hierarchical
Summary Description Scheme of the present invention shown in FIG. 2.

The main information of the Hierarchical Summary Description Scheme comprises the types of
summary events of the summary video, the time information describing each summary video
interval, the representative frame, the representative sound, and the event types in each
interval.

The video summary describing part 108 outputs the video summary description data according
to the description scheme illustrated in FIG. 2.

FIG. 2 is a drawing that illustrates the data structure of the HierarchicalSummary DS
describing the video summary description scheme according to the present invention in UML
(Unified Modeling Language).

The HierarchicalSummary DS 201 describing the video summary is composed of one or more
HighlightLevel DSs 202 and zero or one SummaryThemeList DS 203.

The SummaryThemeList DS provides the functionality of event based summary and browsing by
enumerating and describing the information of the subjects or events constituting the
summary. The HighlightLevel DS 202 is composed of as many HighlightSegment DSs 204 as the
number of video intervals constituting the summary video of that level, and zero or
several HighlightLevel DSs.

The HighlightSegment DS describes the information corresponding to the interval of each
summary video. The HighlightSegment DS is composed of one VideoSegmentLocator DS 205, zero
or several ImageLocator DSs 206, zero or several SoundLocator DSs 207 and an
AudioSegmentLocator 208.
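
The containment structure just described can be mirrored in a few data classes. This is a minimal sketch, not the normative schema: the Python field names, the use of frame pairs and file-name strings for the locators, and the optional AudioSegmentLocator are all simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

FrameInterval = Tuple[int, int]  # (start_frame, end_frame); a simplification


@dataclass
class HighlightSegment:
    video_segment_locator: FrameInterval                      # exactly one
    image_locators: List[str] = field(default_factory=list)   # zero or several
    sound_locators: List[str] = field(default_factory=list)   # zero or several
    audio_segment_locator: Optional[FrameInterval] = None     # assumed optional


@dataclass
class HighlightLevel:
    segments: List[HighlightSegment]
    children: List["HighlightLevel"] = field(default_factory=list)  # lower levels


@dataclass
class HierarchicalSummary:
    levels: List[HighlightLevel]                            # one or more
    theme_list: Optional[Dict[str, Optional[str]]] = None   # zero or one

    def __post_init__(self):
        if not self.levels:
            raise ValueError("at least one HighlightLevel is required")


def iter_segments(level: HighlightLevel) -> Iterator[HighlightSegment]:
    """Yield the segments of a level, then those of its lower levels."""
    yield from level.segments
    for child in level.children:
        yield from iter_segments(child)


detail = HighlightLevel([HighlightSegment((100, 200)), HighlightSegment((500, 700))])
top = HighlightLevel(
    [HighlightSegment((100, 150), image_locators=["goal1.jpg"])],  # hypothetical file
    [detail],
)
summary = HierarchicalSummary([top])
print([s.video_segment_locator for s in iter_segments(top)])
```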

The following gives a more detailed description of the HierarchicalSummary DS.

The HierarchicalSummary DS has an attribute of SummaryComponentList which explicitly
represents the summary types comprised by the HierarchicalSummary DS.

The SummaryComponentList is derived on the basis of the SummaryComponentType and describes
the summary by enumerating all of the comprised SummaryComponentTypes.

In the SummaryComponentList, there are five types: keyFrames, keyVideoClips,
keyAudioClips, keyEvents, and unconstraint.

The keyFrames represents the key frame summary composed of representative frames. The
keyVideoClips represents the key video clip summary composed of sets of key video
intervals. The keyEvents represents the summary composed of the video intervals
corresponding to either an event or a subject. The keyAudioClips represents the key audio
clip summary composed of sets of representative audio intervals. And the unconstraint
represents the types of summary defined by users other than said summaries.
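
The five component types lend themselves to an enumeration. The sketch below assumes, purely for illustration, that a SummaryComponentList is serialized as a whitespace-separated string; the class and function names are hypothetical.

```python
from enum import Enum


class SummaryComponentType(Enum):
    KEY_FRAMES = "keyFrames"
    KEY_VIDEO_CLIPS = "keyVideoClips"
    KEY_AUDIO_CLIPS = "keyAudioClips"
    KEY_EVENTS = "keyEvents"
    UNCONSTRAINT = "unconstraint"


def parse_component_list(text):
    """Parse a whitespace-separated list (an assumed serialization) into enum members.

    An unknown token raises ValueError, since Enum lookup by value fails.
    """
    return [SummaryComponentType(token) for token in text.split()]


components = parse_component_list("keyVideoClips keyEvents")
print([c.name for c in components])
```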

Also, in order to describe an event based summary, the HierarchicalSummary DS may comprise
the SummaryThemeList DS, which enumerates the events (or subjects) comprised in the
summary and describes their IDs.

The SummaryThemeList has an arbitrary number of SummaryThemes as elements. The
SummaryTheme has an attribute of id of ID type and selectively has an attribute of
parentId.

The SummaryThemeList DS permits the users to browse the summary video from the
viewpoint of each event or of several subjects described in the SummaryThemeList. That is,
the application tool receiving the description data lets the user select the desired
subject by parsing the SummaryThemeList DS and providing the information to the user.

At this time, if these subjects are enumerated in a simple flat format and the number of
subjects is large, it might not be easy to find the subject desired by the users.

Accordingly, by representing the subjects as a tree structure similar to a ToC (Table
of Contents), the users can efficiently browse by each subject after finding the desired
subject.

In order to do so, the present invention permits the attribute parentId to be
selectively used in the SummaryTheme. The parentId denotes the upper element (upper subject)
in the tree structure.
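
The ToC-like tree described above can be rebuilt from a flat list of themes by following the parentId links. The following Python sketch is illustrative only: the theme ids, labels, and function names are hypothetical, and a theme without a parentId is assumed to be a top-level subject.

```python
from collections import defaultdict

# Hypothetical SummaryThemes as (id, label, parentId); None marks a top-level subject.
themes = [
    ("E0", "soccer match", None),
    ("E1", "goal", "E0"),
    ("E2", "shoot", "E0"),
    ("E3", "penalty kick", "E2"),
]


def build_theme_tree(themes):
    """Group theme ids under their parentId, preserving document order."""
    children = defaultdict(list)
    for theme_id, _label, parent_id in themes:
        children[parent_id].append(theme_id)
    return children


def render(themes, children, parent=None, depth=0):
    """Return the tree as indented lines, one per theme, like a table of contents."""
    labels = {theme_id: label for theme_id, label, _parent in themes}
    lines = []
    for theme_id in children.get(parent, []):
        lines.append("  " * depth + f"{theme_id}: {labels[theme_id]}")
        lines.extend(render(themes, children, theme_id, depth + 1))
    return lines


toc = render(themes, build_theme_tree(themes))
print("\n".join(toc))
```

An event selecting interface could present `toc` directly, so that the user narrows down from an upper subject to the desired lower subject.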

The HierarchicalSummary DS of the present invention comprises HighlightLevel DSs, and
each HighlightLevel DS comprises one or more HighlightSegment DSs, each of which corresponds
to a video segment (or interval) constituting the summary video.

The HighlightLevel DS has an attribute themeIds of IDREFS type.

The themeIds attribute describes the ids of the subjects and events common to the children
HighlightLevel DSs of the corresponding HighlightLevel DS or to all HighlightSegment DSs
comprised in the HighlightLevel, and each id is described in said SummaryThemeList DS.

The themeIds attribute can denote several events. In an event based summary, describing
the subjects common to the HighlightSegments of a level once, in the themeIds of that level,
solves the problem that the same id is unnecessarily repeated in all segments constituting
the level.

The HighlightSegment DS comprises one VideoSegmentLocator DS and one or more
ImageLocator DSs, zero or one SoundLocator DS and zero or one AudioSegmentLocator DS.
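
The containment rules stated so far (a level holds one or more segments and optional lower levels; a segment holds one video locator, one or more image locators, and at most one sound and one audio locator) can be captured in a few data classes. This is a minimal Python sketch, not the normative schema: the field names are hypothetical and locators are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HighlightSegment:
    video_segment_locator: str                    # exactly one: time info or the video itself
    image_locators: List[str]                     # one or more representative frames
    sound_locator: Optional[str] = None           # zero or one representative sound
    audio_segment_locator: Optional[str] = None   # zero or one audio summary segment

    def __post_init__(self):
        if not self.image_locators:
            raise ValueError("a HighlightSegment needs at least one ImageLocator")


@dataclass
class HighlightLevel:
    segments: List[HighlightSegment]              # one or more highlight segments
    children: List["HighlightLevel"] = field(default_factory=list)  # lower levels
    theme_ids: List[str] = field(default_factory=list)              # themeIds (IDREFS)

    def __post_init__(self):
        if not self.segments:
            raise ValueError("a HighlightLevel needs at least one HighlightSegment")


# Hypothetical instance: one level with a single goal segment.
segment = HighlightSegment("00:02:00-00:02:08", ["frame_0120.jpg"], sound_locator="goal.wav")
level0 = HighlightLevel([segment], theme_ids=["E1"])
print(level0)
```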

Herein, the VideoSegmentLocator DS describes the time information or the video
itself of the video segment constituting the summary video. The ImageLocator DS describes
the image data information of the representative frame of the video segment. The
SoundLocator DS describes the sound information representing the corresponding video
segment interval. The AudioSegmentLocator DS describes the interval time information of the
audio segment constituting the audio summary or the audio information itself.

The HighlightSegment DS has an attribute themeIds. Using the ids defined in the
SummaryThemeList, the themeIds attribute describes which subjects or events described in said
SummaryThemeList DS relate to the corresponding highlight segment.

The themeIds attribute can denote more than one event. By allowing one highlight
segment to have several subjects, the present invention efficiently solves the problem of
unavoidable duplication of descriptions, which arises in the existing method for event based
summary because the video segment must be described once for each event (or subject).
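
How themeIds avoids duplicated descriptions can be seen by selecting segments for one event: each segment is described once, carries any number of theme ids, and also inherits the ids common to its level. The Python sketch below assumes hypothetical class and function names and reduces a segment to a name plus its theme ids.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Segment:
    name: str
    theme_ids: Set[str] = field(default_factory=set)   # themeIds of the segment


@dataclass
class Level:
    segments: List[Segment]
    theme_ids: Set[str] = field(default_factory=set)   # themeIds common to the level
    children: List["Level"] = field(default_factory=list)


def segments_for_theme(level, theme_id, inherited=frozenset()):
    """Collect names of segments that relate to theme_id, directly or via their level."""
    common = set(inherited) | level.theme_ids
    hits = [s.name for s in level.segments if theme_id in (common | s.theme_ids)]
    for child in level.children:
        hits.extend(segments_for_theme(child, theme_id, common))
    return hits


# Hypothetical data: every segment belongs to "soccer"; s2 is both a goal and a shoot.
level = Level(
    segments=[Segment("s1", {"goal"}), Segment("s2", {"goal", "shoot"}), Segment("s3")],
    theme_ids={"soccer"},
)
print(segments_for_theme(level, "goal"))
print(segments_for_theme(level, "shoot"))
```

Segment `s2` appears in both selections although it is described only once, which is the duplication the text says the themeIds attribute removes.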

When describing the highlight segments constituting the summary video, the existing
hierarchical summary description scheme describes only the time information of the highlight
video interval. In contrast, the present invention introduces the HighlightSegment DS for
describing the highlight segment constituting the summary video and places in it the
VideoSegmentLocator DS, the ImageLocator DS and the SoundLocator DS, which describe the
video interval information, the representative frame information and the representative
sound information of each highlight segment, respectively. This makes it feasible to carry
out efficiently both the overview through the highlight segment video and the navigation and
browsing utilizing the representative frame and the representative sound of the segment.

The SoundLocator DS is capable of describing the representative sound of a video
interval through a characteristic sound that actually occurs in the interval, for example a
gun shot, an outcry, an anchor's comment in soccer (for example, goal and shoot), an actor's
name in a drama, a specific word, etc. By placing the SoundLocator DS, it is possible to
browse efficiently by roughly understanding, within a short time and without playing the
video interval, whether the interval is an important interval containing the desired
contents or what contents are contained in the interval.

FIG. 3 is a compositional drawing of the user interface of a tool for playing and
browsing the summary video, which takes as input the video summary description data
described by the same description scheme as FIG. 2.

The video playing part 301 plays the original video or the summary video
according to the control of the user. The original video representative frame part 305 shows
the representative frames of the original video shots. That is, it is composed of a series of
images with reduced sizes.

The representative frame of the original video shot is described not by the
HierarchicalSummary DS of the present invention but by an additional description scheme, and
can be utilized when that description data is provided along with the summary
description data described by the HierarchicalSummary DS of the present invention.

The user accesses the original video shot corresponding to a representative
frame by clicking the representative frame.

The summary video level 0 representative frame part and the representative
sound part 307 and the summary video level 1 representative frame part and the representative
sound part 306 show the frame and sound information representing each video interval of the
summary video level 0 and the summary video level 1 respectively. That is, each is composed
of a series of reduced-size images and of iconic images representing the sounds.

If the user clicks a representative frame in the summary video representative
frame part and the representative sound part, the user accesses the original video interval
corresponding to the representative frame. Herein, in the case of clicking the representative
sound icon corresponding to the representative frame of the summary video, the representative
sound of the video interval is played.

The summary video controlling part 302 receives the user's control input for
playing the summary video. When a multi level summary video is provided,
the user does overview and browsing by selecting the summary of the desired level through
the level selecting part 303. The event selecting part 304 enumerates the events and subjects
provided by the SummaryThemeList, and the user does overview and browsing by selecting
the desired event. In effect, this realizes a summary of the user customization type.
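
The combination of level selection and event selection described above amounts to filtering the highlight segments of one level by a theme id. A minimal Python sketch follows; the data layout (level number mapped to start time, end time, and theme ids, in seconds) and the function name `playlist` are assumptions made for illustration.

```python
# Hypothetical description data: level -> list of (start_s, end_s, theme ids).
summary = {
    0: [(120.0, 128.0, {"goal"}), (610.0, 618.5, {"goal"})],
    1: [
        (30.0, 40.0, {"shoot"}),
        (120.0, 135.0, {"goal"}),
        (400.0, 409.0, {"shoot"}),
        (610.0, 625.0, {"goal"}),
    ],
}


def playlist(summary, level, event=None):
    """Return the (start, end) intervals to play for the chosen level and event.

    With event=None the whole summary of that level is played.
    """
    segments = summary[level]
    if event is not None:
        segments = [s for s in segments if event in s[2]]
    ordered = sorted(segments, key=lambda s: s[0])  # play in original time order
    return [(start, end) for start, end, _themes in ordered]


shoot_level1 = playlist(summary, level=1, event="shoot")
print(shoot_level1)
print(sum(end - start for start, end in shoot_level1))  # total playing time in seconds
```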

FIG. 4 is a compositional drawing of the flow of data and control for
hierarchical browsing using the summary video of the present invention.

The browsing is performed by accessing the data for browsing with the method
of FIG. 4 through the user interface of FIG. 3. The data for browsing are the
summary video, the representative frames of the summary video, the original video 406
and the original video representative frames 405.

The summary video is assumed to have two levels. Needless to say, the
summary video may have more than two levels. The summary video level 0 401 is
summarized with a shorter time than the summary video level 1 403. That is, the summary
video level 1 contains more contents than the summary video level 0. The summary video
level 0 representative frame 402 is the representative frame of the summary video level 0 and
the summary video level 1 representative frame 404 is the representative frame of the
summary video level 1.

The summary video and the original video are played through the video
playing part 301 of FIG. 3. The summary video level 0 representative frame is displayed in the
summary video level 0 representative frame and the representative sound part 306, the
summary video level 1 representative frame is displayed in the summary video level 1
representative frame and the representative sound part 307, and the original video
representative frame is displayed in the original video representative frame part 305.

The hierarchical browsing method illustrated in FIG. 4 can follow various
hierarchical paths, as in the following examples.

Case 1: (1) - (2)
Case 2: (1) - (3) - (5)
Case 3: (3) - (4) - (6)
Case 4: (7) - (5)
Case 5: (7) - (4) - (6)

The overall browsing scheme is as follows.

First, the user understands the overall contents of the original video by watching the
summary video of the original video. Herein, either the summary video level 0 or the summary
video level 1 may be played. When more detailed browsing is wanted after
watching the summary video, the video interval of interest is identified through the summary
video representative frames. If the exact scene desired is identified in
the summary video representative frames, it is played by directly accessing the video interval
of the original video to which the representative frame is connected. If more detailed
information is needed, the user may access the desired original video either by
examining the representative frames of the next level or by hierarchically examining the
contents of the representative frames of the original video.

Although browsing to reach the desired contents might take a long time while the
original video is being played, with these hierarchical browsing techniques the
browsing time is drastically reduced by directly accessing the contents of the original video
through the hierarchical representative frames.

The existing general video indexing and browsing techniques divide the
original video into shot units, constitute a representative frame representing each shot, and
access a shot after the user recognizes the desired shot from its representative frame.

In this case, because the number of shots of the original video is large, a lot
of time and effort is necessary to find the desired contents among the many
representative frames.

In the present invention, it is feasible to access the desired video quickly by
constituting the hierarchical representative frames with the representative frames of the
summary video.

Case 1 is the case that plays the summary video level 0 and directly
accesses the original video from the summary video level 0 representative frame.

Case 2 is the case that plays the summary video level 0, selects the most
interesting representative frame from the summary video level 0 representative frames,
identifies the desired scene in the summary video level 1 representative frames corresponding
to the neighborhood of that representative frame in order to understand more detailed
information before accessing the original video, and then accesses the original video.
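
The step in case 2 of moving from a selected level 0 representative frame to the level 1 representative frames in its neighborhood can be sketched as a range lookup over frame timestamps. The source does not define how large the neighborhood is, so the fixed time window below, like the function name and sample timestamps, is purely an assumption for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical timestamps (seconds in the original video) of the level 1
# representative frames, kept sorted.
level1_frames = [12.0, 95.0, 118.0, 126.0, 140.0, 300.0, 612.0]


def neighbours(frame_times, selected_time, window):
    """Return the frames whose timestamps lie within +/- window of selected_time.

    Assumption: "neighborhood" is modelled as a symmetric time window.
    """
    lo = bisect_left(frame_times, selected_time - window)
    hi = bisect_right(frame_times, selected_time + window)
    return frame_times[lo:hi]


# The user picked a level 0 frame at 124 s; show level 1 frames within 30 s of it.
nearby = neighbours(level1_frames, selected_time=124.0, window=30.0)
print(nearby)
```

Each returned timestamp would then link to the corresponding interval of the original video, which is the final access step of the path.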

Case 3 is the case that, when access from the summary video level 1
representative frame to the original video is difficult in case 2, selects the most
interesting representative frame to obtain more detailed information, identifies the desired
scene from the original video representative frames neighboring that representative frame,
and then accesses the original video using the representative frame of the original video.

Case 4 and case 5 are the cases that start from the playing of the summary
video level 1, and their paths are similar to the above cases.

When applied to a server/client environment, the present invention can
provide a system in which multiple clients access one server and can do video overview
and browsing. The original video is input to the server, which is equipped with a video
summary description data generation system that produces the video summary description
data on the basis of the hierarchical summary description scheme and links said original
video and the video summary description data. The client accesses the server through the
communication network, does an overview of the video using the video summary description
data, and does browsing and navigation of the video by accessing the original video.

Although the present invention was described on the basis of preferred
embodiments, these embodiments do not limit the present invention but
exemplify it. Also, it will be appreciated by those skilled in the art that changes and
variations in the embodiments herein can be made without departing from the spirit and scope
of the present invention as defined by the following claims.

CLAIMS

What we claim: 

1. A HierarchicalSummary Description Scheme (DS) for describing a video
summary, the HierarchicalSummary DS comprises at least one HighlightLevel DS which is
describing highlight level, wherein said HighlightLevel DS comprises at least one
HighlightSegment DS which is describing highlight segment information constituting the
summary video of the highlight level.

2. The HierarchicalSummary DS according to claim 1, wherein said
HighlightLevel DS is composed of at least one lower level HighlightLevel DSs.

3. The HierarchicalSummary DS according to claim 1, wherein said
HighlightSegment DS comprises a VideoSegmentLocator DS which is describing time
information or video itself of said corresponding highlight segment.

4. The HierarchicalSummary DS according to claim 3, wherein said HighlightSegment DS
further comprises ImageLocator DS which is describing the representative frame of said
corresponding highlight segment.

5. The HierarchicalSummary DS according to claim 3, wherein said
HighlightSegment DS further comprises SoundLocator DS which is describing the
representative sound information of said corresponding highlight segment.

6. The HierarchicalSummary DS according to claim 1, wherein said
HighlightSegment DS further comprises ImageLocator DS which is describing the
representative frame of said corresponding highlight segment and SoundLocator DS which is
describing the representative sound information of said corresponding highlight segment.

7. The HierarchicalSummary DS according to claim 4, wherein said
ImageLocator DS describes time information or image data of the representative frame of
video interval corresponding to said corresponding highlight segment.

8. The HierarchicalSummary DS according to claim 3, wherein said
HighlightSegment DS further comprises AudioSegmentLocator DS which is describing the
audio segment information constituting an audio summary of said corresponding highlight
segment.

9. The HierarchicalSummary DS according to claim 8, wherein said
AudioSegmentLocator DS describes time information or audio data of the audio interval of
said corresponding highlight segment.

10. The HierarchicalSummary DS according to claim 1, wherein said
HierarchicalSummary DS includes SummaryComponentList describing and enumerating all
of the SummaryComponentTypes which is included in the HierarchicalSummary DS.

11. The HierarchicalSummary DS according to claim 10, wherein said
SummaryComponentType includes keyFrames representing the key frame summary
composed of representative frames, keyVideoClips representing the key video clip summary
composed of key video segment' sets, keyEvents representing the summary of the video
interval corresponding to either the event or the subject, keyAudioClips representing the key
audio clip summary composed of representative audio intervals' sets, and unconstraint
representing the type of summary defined by users except for said summaries.

12. The HierarchicalSummary DS according to claim 1, wherein said
HierarchicalSummary DS includes SummaryThemeList DS which is enumerating the event or
subject comprised in the summary and describing the ID and then describes event based
summary and permits the users to browse the summary video by the event or subject described
in said SummaryThemeList.

13. The HierarchicalSummary DS according to claim 11, wherein said
SummaryThemeList DS includes arbitrary number of SummaryThemes as elements and said
SummaryTheme includes an attribute of id representing the corresponding event or subject.

14. The HierarchicalSummary DS according to claim 13, wherein said
SummaryTheme further includes an attribute of parentID which is to describe the id of the
event or subject of the upper level.

15. The HierarchicalSummary DS according to claim 13, wherein said
HighlightLevel DS includes an attribute of themeIds describing said attribute of ids of
common events or subjects if all of the HighlightSegments and HighlightLevels which are
constituting corresponding highlight level have common events or subjects.

16. The HierarchicalSummary DS according to claim 13, wherein said
HighlightSegment DS includes an attribute of themeIds describing said attribute of id and
describes the event or subject of the corresponding highlight segment.

17. A computer-readable recording medium where a HierarchicalSummary DS
is stored therein, the HierarchicalSummary DS comprises at least one HighlightLevel DS
which is describing highlight level, wherein said HighlightLevel DS comprises at least one
HighlightSegment DS which is describing highlight segment information constituting the
summary video of the highlight level, wherein said HighlightSegment DS comprises
VideoSegmentLocator DS describing time information or video itself of said corresponding
highlight segment.

18. A method for generating video summary description data according to
video summary description scheme by inputting original video, comprising:

video analyzing step which is producing video analysis result by inputting the
original video and then analyzing the original video;

summary rule defining step which is defining the summary rule for selecting
summary video interval;

summary video interval selecting step which is constituting summary video
interval information by selecting the video interval capable of summarizing video contents
from the original video by inputting said original video analysis result and said summary rule;
and

video summary describing step which is producing video summary description
data according to the HierarchicalSummary DS by inputting the summary video interval
information output by said summary video interval selecting step.

19. The method for generating video summary description data according to
claim 18, wherein said HierarchicalSummary DS comprises at least one HighlightLevel DS
which is describing highlight level, wherein said HighlightLevel DS comprises at least one
HighlightSegment DS which is describing highlight segment information constituting the
summary video of the highlight level, wherein said HighlightSegment DS comprises
VideoSegmentLocator DS describing time information or video itself of said corresponding
highlight segment.

20. The method for generating video summary description data according to
claim 18, wherein said video analyzing step comprises:

feature extracting step which is outputting the types of features and video time
interval at which those features are detected by inputting the original video and extracting
those features;

event detecting step which is detecting key events included in the original
video by inputting said types of features and video time interval at which those features are
detected; and

episode detecting step which is detecting episode by dividing the original video
into story flow base on the basis of said detected event.

21. The method for generating video summary description data according to
claim 18, wherein said summary rule defining step provides the types of summary events,
which are bases in selecting the summary video interval, after defining them to said video
summary describing step.

22. The method for generating video summary description data according to
claim 18, the method further comprises representative frame extracting step which is
providing the representative frame to said video summary describing step by inputting said
summary video interval information and extracting representative frame.

23. The method for generating video summary description data according to
claim 18, the method further comprises representative sound extracting step which is
providing the representative sound to said video summary describing step by inputting said
summary video interval information and extracting representative sound.

24. A computer-readable recording medium where a program is stored therein,
the program is to execute:

feature extracting step which is outputting the types of features and video time
interval at which those features are detected;

event detecting step which is detecting key events included in the original
video by inputting said types of features and said video time interval at which those features
are detected;

episode detecting step which is detecting episode by dividing the original video
into story flow base on the basis of said detected key events;

summary rule defining step which is defining the summary rule for selecting
the summary video interval;

summary video interval selecting step which is constituting summary video
interval information by selecting the video interval capable of summarizing the video contents
of the original video by inputting said detected episode and said summary rule; and

video summary describing step which is generating video summary description
data with HierarchicalSummary DS by inputting the summary video interval information
output by said summary video interval selecting step.

25. A system for generating video summary description data according to video
summary description scheme by inputting original video, comprising:

video analyzing means for outputting video analysis result by inputting original
video and analyzing the original video;

summary rule defining means for defining the summary rule for selecting the
summary video interval;

summary video interval selecting means for constituting summary video
interval information by selecting the video interval capable of summarizing the video contents
of the original video by inputting said video analysis result and said summary rule; and

video summary describing means for generating video summary description
data with HierarchicalSummary DS by inputting the summary video interval information
output by said summary video interval selecting means.

26. The system for generating video summary description data according to
claim 25, wherein said HierarchicalSummary DS comprises at least one HighlightLevel DS
which is describing highlight level, wherein said HighlightLevel DS comprises at least one
HighlightSegment DS which is describing highlight segment information constituting the
summary video of the highlight level, wherein said HighlightSegment DS comprises
VideoSegmentLocator DS describing time information or video itself of said corresponding
highlight segment.

27. The system for generating video summary description data according to
claim 25, wherein said video analyzing means comprises:

feature extracting means for outputting the types of features and video time
interval at which those features are detected by inputting the original video and extracting
those features;

event detecting means for detecting key events included in the original video
by inputting said types of features and video time interval at which those features are detected;
and

episode detecting means for detecting episode by dividing the original video
into story flow base on the basis of said detected event.

28. The system for generating video summary description data according to
claim 25, wherein said summary rule defining means provides the types of summary events,
which are bases in selecting the summary video interval, after defining them to said video
summary describing means.

29. The system for generating video summary description data according to
claim 25, the system further comprises representative frame extracting means for providing
the representative frame to said video summary describing means by inputting said summary
video interval information and extracting representative frame.

30. The system for generating video summary description data according to
claim 25, the system further comprises representative sound extracting means for providing
the representative sound to said video summary describing means by inputting said summary
video interval information and extracting representative sound.

31. A computer-readable recording medium where a program is stored therein,
the program is for functioning:

feature extracting means for outputting the types of features and video time
interval at which those features are detected;

event detecting means for detecting key events included in the original video
by inputting said types of features and said video time interval at which those features are
detected;

episode detecting means for detecting episode by dividing the original video
into story flow base on the basis of said detected key events;

summary rule defining means for defining the summary rule for selecting the
summary video interval;

summary video interval selecting means for constituting summary video
interval information by selecting the video interval capable of summarizing the video contents
of the original video by inputting said detected episode and said summary rule; and

video summary describing means for generating video summary description
data with HierarchicalSummary DS by inputting the summary video interval information
output by said summary video interval selecting step.



32. A video browsing system in a server/client circumstance, comprising:

a server which is equipped with video summary description data generation
system which generates video summary description data on the basis of HierarchicalSummary
DS by inputting original video and links said original video and video summary description
data; and

a client which is browsing and navigating video by overview of said original
video and access to the original video of said server using said video summary description
data.

FIG. 3